Abstract

Meaning cannot be based on dictionary def- 
initions all the way down: at some point 
the circularity of definitions must be bro- 
ken in some way, by grounding the mean- 
ings of certain words in sensorimotor cat- 
egories learned from experience or shaped 
by evolution. This is the "symbol ground- 
ing problem". We introduce the concept of 
a reachable set — a larger vocabulary whose 
meanings can be learned from a smaller vo- 
cabulary through definition alone, as long as 
the meanings of the smaller vocabulary are 
themselves already grounded. We provide 
simple algorithms to compute reachable sets 
for any given dictionary. 

1 Introduction 

We know from the 19th century philosopher- 
mathematician Frege that the referent and the mean- 
ing (or "sense") of a word (or phrase) are not the same 
thing: two different words or phrases can refer to the 
very same object without having the same meaning 
(Frege, 1948): "George W. Bush" and "the current 
president of the United States of America" have the 
same referent but a different meaning. So do "human 
females" and "daughters". And "things that are big- 
ger than a breadbox" and "things that are not the size 
of a breadbox or smaller". 

A word's "extension" is the set of things to which it refers, and its "intension" is the rule for defining what things fall within its extension. A word's meaning is hence something closer to a rule for picking out its referent. Is the dictionary definition of a word, then, its meaning?

Clearly, if we do not know the meaning of a word, 
we look up its definition in a dictionary. But what if 
we do not know the meaning of any of the words in its 
dictionary definition? And what if we don't know the 
meanings of the words in the definitions of the words 
defining those words, and so on? This is a problem of 
infinite regress, called the "symbol grounding prob- 
lem" (Harnad, 1990; Harnad, 2003): the meanings of 
words in dictionary definitions are, in and of them- 
selves, ungrounded. The meanings of some of the 
words, at least, have to be grounded by some means 
other than dictionary definition look-up. 

How are word meanings grounded? Almost cer- 
tainly in the sensorimotor capacity to pick out their 
referents (Harnad, 2005). Knowing what to do with 
what is not a matter of definition but of adaptive sen- 
sorimotor interaction between autonomous, behav- 
ing systems and categories of "objects" (including 
individuals, kinds, events, actions, traits and states). 
Our embodied sensorimotor systems can also be de- 
scribed as applying information processing rules to 
inputs in order to generate the right outputs, just as 
a thermostat defending a temperature of 20 degrees 
can be. But this dynamic process is in no useful way 
analogous to looking up a definition in a dictionary. 

We will not be discussing sensorimotor ground- 
ing (Barsalou, 2008; Glenberg & Robertson, 2002; 
Steels, 2007) in this paper. We will assume some 
sort of grounding as given: when we consult a dictio- 
nary, we already know the meanings of at least some 



words, somehow. A natural first hypothesis is that the 
grounding words ought to be more concrete, refer- 
ring to things that are closer to our overt sensorimo- 
tor experience, and learned earlier, but that remains 
to be tested (Clark, 2003). Apart from the question of 
the boundary conditions of grounding, however, there 
are basic questions to be asked about the structure of 
word meanings in dictionary definition space. 

In the path from a word, to the definition of that word, to the definition of the words in the definition of that word, and so on, through what sort of a structure are we navigating (Ravasz & Barabasi, 2003; Steyvers & Tenenbaum, 2005)? Meaning is compositional: A definition is composed of words, combined according to syntactic rules to form a proposition (with a truth value: true or false). For example, the word to be defined w (the "definiendum") might mean $w_1 \& w_2 \& \ldots \& w_n$, where the $w_i$ are other words (the "definientes") in its definition. Rarely does that proposition provide the full necessary and sufficient conditions for identifying the referent of the word, w, but the approximation must at least be close enough to allow most people, armed with the definition, to understand and use the defined word most of the time, possibly after looking up a few of its definientes $d_w$, but without having to cycle through the entire dictionary, and without falling into circularity or infinite regress.

If enough of the definientes are grounded, then there is no problem of infinite regress. But we can still ask the question: What is the size of the grounding vocabulary? And what words does it contain? What is the length and shape of the path that would be taken in a recursive definitional search, from a word, to its definition, to the definition of the words in its definition, and so on? Would it eventually cycle through the entire dictionary? Or would there be disjoint subsets?

This paper raises more questions than it answers, 
but it develops the formal groundwork for a new 
means of finding the answers to questions about how 
word meaning is explicitly represented in real dictio- 
naries — and perhaps also about how it is implicitly 
represented in the "mental lexicon" that each of us 
has in our brain (Hauk et al., 2008). 

The remainder of this paper is organized as follows: In Section 2, we introduce the graph-theoretical definitions and notations used for formulating the symbol grounding problem in Section 3. Sections 4 and 5 deal with the implications of this approach for cognitive science and show in what ways grounding kernels may be useful.

2 Definitions and Notations 

In this section, we give mathematical definitions for 
the dictionary-related terminology, relate them to nat- 
ural language dictionaries and supply the pertinent 
graph theoretical definitions. Additional details are 



given to ensure mutual comprehensibility to special- 
ists in the three disciplines involved (mathematics, 
linguistics and psychology). Complete introductions 
to graph theory and discrete mathematics are pro- 
vided in (Bondy & Murty, 1978; Rosen, 2007). 

2.1 Relations and Functions 

Let A be any set. A binary relation on A is any subset R of $A \times A$. We write xRy if $(x, y) \in R$. The relation R is said to be (1) reflexive if for all $x \in A$, we have xRx, (2) symmetric if for all $x, y \in A$ such that xRy, we have yRx and (3) transitive if for all $x, y, z \in A$ such that xRy and yRz, we have xRz. The relation R is an equivalence relation if it is reflexive, symmetric and transitive. For any $x \in A$, the equivalence class of x, designated by [x], is given by $[x] = \{y \in A \mid xRy\}$. It is easy to show that [x] = [y] if and only if xRy and that the set of all equivalence classes forms a partition of A.

Let A be any set, $f : A \to A$ a function and k a positive integer. We designate by $f^k$ the function $f \circ f \circ \ldots \circ f$ (k times), where $\circ$ denotes the composition of functions.

2.2 Dictionaries 

At its most basic level, a dictionary is a set of associated pairs: a word and its definition, along with some disambiguating parameters. The word^1 to be defined, w, is called the definiendum (plural: definienda) while the finite nonempty set of words that defines w, $d_w$, is called the set of definientes of w (singular: definiens).

Each dictionary entry accordingly consists of a definiendum w followed by its set of definientes $d_w$. A dictionary D then consists of a finite set of pairs $(w, d_w)$ where w is a word and $d_w = \{w_1, w_2, \ldots, w_n\}$, where $n \geq 1$, is its definition, satisfying the property that for all $(w, d_w) \in D$ and for all $d \in d_w$, there exists $(w', d_{w'}) \in D$ such that $d = w'$. A pair $(w, d_w)$ is called an entry of D. In other words, a dictionary is a finite set of words, each of which is defined, and each of its defining words is likewise defined somewhere in the dictionary.
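As an illustration, the closure property just stated can be checked mechanically. The following is a minimal Python sketch, assuming that a dictionary is stored as a mapping from each definiendum to its set of definientes; the function name `is_dictionary` and the toy entries are ours and serve only as an example.

```python
def is_dictionary(entries):
    """Return True if `entries` (word -> set of definientes) satisfies the
    two conditions of Section 2.2: every definition is nonempty, and every
    definiens is itself the definiendum of some entry."""
    return all(
        len(d_w) >= 1 and d_w.issubset(entries)
        for d_w in entries.values()
    )


# Hypothetical toy data: the first mapping is closed; the second is not,
# because "red" is used in a definition but is never defined.
closed = {"good": {"not", "bad"}, "bad": {"not", "good"}, "not": {"not"}}
not_closed = {"apple": {"red", "fruit"}, "fruit": {"fruit"}}

print(is_dictionary(closed))      # True
print(is_dictionary(not_closed))  # False
```

Representing $d_w$ as a set matches the treatment of definitions as unordered collections of words in this section.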

2.3 Graphs 

A directed graph is a pair G = (V, E) such that V is a finite set of vertices and $E \subseteq V \times V$ is a finite set of arcs. Given $V' \subseteq V$, the subgraph induced by V', designated by G[V'], is the graph G[V'] = (V', E') where $E' = E \cap (V' \times V')$. For any $v \in V$, $N^-(v)$ and $N^+(v)$ designate, respectively, the set of incoming and outgoing neighbors of v, i.e.

$$N^-(v) = \{u \in V \mid (u, v) \in E\},$$
$$N^+(v) = \{u \in V \mid (v, u) \in E\}.$$

^1 In the context of this mathematical analysis, we will use "word" to mean a finite string of uninterrupted letters having some associated meaning.



We write $\deg^-(v) = |N^-(v)|$ and $\deg^+(v) = |N^+(v)|$, respectively. A path of G is a sequence $(v_1, v_2, \ldots, v_n)$, where n is a positive integer, $v_i \in V$ for $i = 1, 2, \ldots, n$ and $(v_i, v_{i+1}) \in E$, for $i = 1, 2, \ldots, n - 1$. A uv-path is a path starting with u and ending with v. Finally, we say that a uv-path is a cycle if u = v.

Given a directed graph G = (V, E) and $u, v \in V$, we write $u \to v$ if there exists a uv-path in G. We define a relation $\sim$ as

$$u \sim v \iff u \to v \text{ and } v \to u.$$

It is an easy exercise to show that $\sim$ is an equivalence relation. The equivalence classes of V with respect to $\sim$ are called the strongly connected components of G.
In other words, in a directed graph, it might be pos- 
sible to go directly from point A to point B, without 
being able to get back from point B to point A (as in 
a city with only one-way streets). Strongly connected 
components, however, are subgraphs in which when- 
ever it is possible to go from point A to point B, it is 
also possible to come back from point B to point A 
(the way back may be different). 

There is a very natural way of representing definitional relations using graph theory, thus providing a formal tool for analyzing grounding properties of dictionaries: words can be represented as vertices, with arcs representing definitional relations, i.e. there is an arc (u, v) from the word u to the word v if u appears in the definition of v. More formally, for every dictionary D, its associated graph G = (V, E) is given by

$$V = \{w \mid \exists d_w \text{ such that } (w, d_w) \in D\},$$
$$E = \{(v, w) \mid \exists d_w \text{ such that } (w, d_w) \in D \text{ and } v \in d_w\}.$$

Note that every vertex v of G satisfies $\deg_G^-(v) > 0$, but it is possible to have $\deg_G^+(v) = 0$. In other words, whereas every word has a definition, some words are not used in any definition.

Example 1. Let D be the dictionary whose definitions are given in Table 1. Note that every word appearing in some definition is likewise defined in D (this is one of the criteria for D to be a dictionary). The associated graph G of D is represented in Figure 1. Note that (not, good, eatable, fruit) is a path of G while (good, bad, good) is a cycle (as well as a path) of G.
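The construction of Example 1 can be reproduced in a few lines. The sketch below is illustrative only: it encodes the entries of Table 1 as a Python mapping (definitions treated as unordered sets of words), builds the arc set E of the associated graph, and checks the path and the cycle mentioned in the example. The helper names `associated_graph` and `is_path` are our own.

```python
TABLE_1 = {
    "apple": "red fruit", "bad": "not good", "banana": "yellow fruit",
    "color": "dark or light", "dark": "not light", "eatable": "good",
    "fruit": "eatable thing", "good": "not bad", "light": "not dark",
    "not": "not", "or": "or", "red": "dark color", "thing": "thing",
    "tomato": "red fruit", "yellow": "light color",
}
D = {w: set(definition.split()) for w, definition in TABLE_1.items()}


def associated_graph(entries):
    """Return (V, E); an arc (v, w) means that v appears in the definition of w."""
    V = set(entries)
    E = {(v, w) for w, d_w in entries.items() for v in d_w}
    return V, E


def is_path(E, sequence):
    """True if consecutive vertices of `sequence` are joined by arcs of E."""
    return all(arc in E for arc in zip(sequence, sequence[1:]))


V, E = associated_graph(D)
out_degree = {v: sum(1 for (u, _) in E if u == v) for v in V}

print(is_path(E, ["not", "good", "eatable", "fruit"]))  # True
print(is_path(E, ["good", "bad", "good"]))              # True (a cycle)
print(sorted(v for v in V if out_degree[v] == 0))       # ['apple', 'banana', 'tomato']
```

The last line lists the words of out-degree 0, that is, the words that are defined but never used in a definition.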

3 A Graph-Theoretical Formulation of 
the Problem 

We are now ready to formulate the symbol grounding 
problem from a mathematical point of view. 
Word      Definition
apple     red fruit
bad       not good
banana    yellow fruit
color     dark or light
dark      not light
eatable   good
fruit     eatable thing
good      not bad
light     not dark
not       not
or        or
red       dark color
thing     thing
tomato    red fruit
yellow    light color

Table 1: Definitions of the dictionary D
Figure 1: Graph representation of the dictionary D.

3.1 Reachable and Grounding Sets 

Given a dictionary D of n words and a person x who knows m out of these n words, assume that the only way x can learn new words is by consulting the dictionary definitions. Can all n words be learned by x through dictionary look-up alone? If not, then exactly what subset of words can be learned by x through dictionary look-up alone?

For this purpose, let G = (V, E) be a directed graph and consider the following map, where $2^V$ denotes the collection of all subsets of V:

$$R_G : 2^V \to 2^V, \qquad U \mapsto U \cup \{v \in V \mid N^-(v) \subseteq U\}.$$

When the context is clear, we omit the subscript G. Also we let $R^k$ denote the k-th power of R. We say that $v \in V$ is k-reachable from U if $v \in R^k(U)$ and k is a nonnegative integer. It is easy to show that there exists an integer k such that $R^\ell(U) = R^k(U)$, for every integer $\ell > k$. More precisely, we have the following definitions:

Definition 2. Let G = (V, E) be a directed graph, U a subset of V, and k an integer such that $R^\ell(U) = R^k(U)$ for all $\ell > k$. The set $R^k(U)$ is called the reachable set from U and is denoted by $R^*(U)$. Moreover, if $R^*(U) = V$, then we say that U is a grounding set of G.



We say that G is p-groundable if there exists $U \subseteq V$ such that $|U| = p$ and U is a grounding set of G. The grounding number of a graph G is the smallest integer p such that G is p-groundable.

Reachable sets can be computed very simply using 
a breadth-first-search type algorithm, as shown by Al- 
gorithm 1. 

Algorithm 1 Computing reachable sets
function ReachableSet(G, U)
    $R \leftarrow U$
    repeat
        $S \leftarrow \{v \in V \mid N^-(v) \subseteq R\} - R$
        $R \leftarrow R \cup S$
    until $S = \emptyset$
    return R
end function



We now present some examples of reachable sets 
and grounding sets. 

Example 3. Consider the dictionary D and the graph G of Example 1. Let U = {bad, light, not, thing}. Note that

$$R^0(U) = U,$$
$$R^1(U) = U \cup \{\text{dark, good}\},$$
$$R^2(U) = R^1(U) \cup \{\text{eatable}\},$$
$$R^3(U) = R^2(U) \cup \{\text{fruit}\},$$
$$R^4(U) = R^3(U),$$

so that $R^*(U)$ = {bad, dark, eatable, fruit, good, light, not, thing} (see Figure 2). In particular, this means that the word "eatable" is 2-reachable (but not 1-reachable) from U and all words in U are 0-reachable from U. Moreover, we observe that U is not a grounding set of G ("color", for example, is unreachable). On the other hand, the set $U' = U \cup \{\text{or}\}$ is a grounding set of G, so that G is 5-groundable.
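The computation of Example 3 can be traced with a short Python sketch of Algorithm 1. It assumes the dictionary of Table 1 is stored as a mapping from each word to its set of definientes, so that $N^-(v)$ is simply the definition of v; the function name `reachability_iterates` is ours, and the code is one possible rendering of the algorithm rather than a reference implementation.

```python
TABLE_1 = {
    "apple": "red fruit", "bad": "not good", "banana": "yellow fruit",
    "color": "dark or light", "dark": "not light", "eatable": "good",
    "fruit": "eatable thing", "good": "not bad", "light": "not dark",
    "not": "not", "or": "or", "red": "dark color", "thing": "thing",
    "tomato": "red fruit", "yellow": "light color",
}
D = {w: set(definition.split()) for w, definition in TABLE_1.items()}


def reachability_iterates(entries, known):
    """Return [R^0(U), R^1(U), ...] up to and including the fixed point R*(U).

    A word joins the set as soon as every word of its definition is
    already in it, exactly as in the loop of Algorithm 1."""
    iterates = [set(known)]
    while True:
        R = iterates[-1]
        S = {w for w, d_w in entries.items() if d_w <= R} - R
        if not S:
            return iterates
        iterates.append(R | S)


U = {"bad", "light", "not", "thing"}
steps = reachability_iterates(D, U)
for k in range(1, len(steps)):
    print(k, sorted(steps[k] - steps[k - 1]))
# 1 ['dark', 'good']
# 2 ['eatable']
# 3 ['fruit']

print(steps[-1] == set(D))                                 # False: U is not a grounding set
print(reachability_iterates(D, U | {"or"})[-1] == set(D))  # True: U' is a grounding set
```

The words printed at step k are those that are k-reachable but not (k-1)-reachable from U.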

3.2 The Minimum Grounding Set Problem 

Given a dictionary and its associated graph G, we are interested in finding minimum grounding sets of G. (Note that in general, there is more than one grounding set of minimum cardinality.) This is related to a natural decision problem; we designate by k-GS the problem of deciding whether G is k-groundable. We show that k-GS is closely related to the problem of finding minimum feedback vertex sets. First, we recall the definition of a feedback vertex set.

Definition 4. Let G = (V, E) be a directed graph and U a subset of V. We say that U is a feedback vertex set of G if for every cycle C of G, we have $U \cap C \neq \emptyset$. In other words, U covers every cycle of G.

The minimum feedback vertex set problem is the problem of finding a feedback vertex set of G of minimum cardinality. To show that feedback vertex sets and grounding sets are the same, we begin by stating two simple lemmas.

Figure 2: The set $R^*(U)$ (the words in squares) obtained from U

Lemma 5. Let G = (V, E) be a directed graph, C a cycle of G and $U \subseteq V$ a grounding set of G. Then $U \cap C \neq \emptyset$.

Proof. By contradiction, assume that $U \cap C = \emptyset$ and, for all $v \in C$, there exists an integer k such that v belongs to $R^k(U)$. Let $\ell$ be the smallest index in the set $\{k \mid \exists u \in C \text{ such that } u \in R^k(U)\}$. Let u be a vertex in $C \cap R^\ell(U)$ and w the predecessor of u in C. Since $U \cap C = \emptyset$, $\ell$ must be greater than 0 and w a member of $R^{\ell-1}(U)$, contradicting the minimality of $\ell$. □

Lemma 6. Every directed acyclic graph G is 0-groundable.

Proof. We prove the statement by induction on $|V|$.

Basis. If $|V| = 1$, then $|E| = 0$, so that the only vertex v of G satisfies $N_G^-(v) = \emptyset$. Hence $R(\emptyset) = V$.

Induction. Let v be a vertex such that $\deg^+(v) = 0$. Such a vertex exists since G is acyclic. Moreover, let G' be the (acyclic) graph obtained from G by removing vertex v and all its incident arcs. By the induction hypothesis, there exists an integer k such that $R^k_{G'}(\emptyset) = V - \{v\}$. Therefore, $V - \{v\} \subseteq R^k_G(\emptyset)$ so that $R^{k+1}_G(\emptyset) = V$. □

The next theorem follows easily from Lemmas 5 
and 6. 

Theorem 7. Let G = (V, E) be a directed graph and $U \subseteq V$. Then U is a grounding set of G if and only if U is a feedback vertex set of G.

Proof. ($\Rightarrow$) Let C be a cycle of G. By Lemma 5, $U \cap C \neq \emptyset$, so that U is a feedback vertex set of G. ($\Leftarrow$) Let G' be the graph obtained from G by removing U. Then G' is acyclic and, by Lemma 6, $\emptyset$ is a grounding set of G'. Therefore, $U \cup \emptyset = U$ is a grounding set of G. □
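Theorem 7 can be sanity-checked by exhaustive enumeration on a small graph. The sketch below is only an illustration under our own assumptions: it uses a ten-word sub-dictionary of Table 1 (closed under definition), tests the grounding property with the fixed-point iteration of Algorithm 1, and tests the feedback vertex set property independently, by a depth-first search for cycles after U is deleted. The function names are ours.

```python
from itertools import combinations

# A closed ten-word sub-dictionary of Table 1, small enough to enumerate.
SUB = {
    "bad": "not good", "color": "dark or light", "dark": "not light",
    "eatable": "good", "fruit": "eatable thing", "good": "not bad",
    "light": "not dark", "not": "not", "or": "or", "thing": "thing",
}
D = {w: set(definition.split()) for w, definition in SUB.items()}


def is_grounding_set(entries, U):
    """U is a grounding set if the reachable set R*(U) is the whole vocabulary."""
    R = set(U)
    while True:
        S = {w for w, d_w in entries.items() if d_w <= R} - R
        if not S:
            return R == set(entries)
        R |= S


def is_feedback_vertex_set(entries, U):
    """U is a feedback vertex set if deleting U leaves no cycle.

    Cycles are detected by depth-first search along definition links
    (arcs reversed, which does not change the set of cycles)."""
    state = {w: "new" for w in entries if w not in U}

    def on_cycle(w):
        state[w] = "open"
        for d in entries[w]:
            if state.get(d) == "open" or (state.get(d) == "new" and on_cycle(d)):
                return True
        state[w] = "done"
        return False

    return not any(state[w] == "new" and on_cycle(w) for w in state)


words = sorted(D)
subsets = [set(c) for r in range(len(words) + 1) for c in combinations(words, r)]
agree = all(is_grounding_set(D, U) == is_feedback_vertex_set(D, U) for U in subsets)
print(len(subsets), agree)  # 1024 True
```

Such a check is of course no substitute for the proof; it merely confirms, on one small instance, that the two properties coincide on every subset of vertices.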

Corollary 8. k-GS is NP-complete.

Proof. Denote by k-FVS the problem of deciding whether a directed graph G admits a feedback vertex set of cardinality at most k. This problem is known to be NP-complete and has been widely studied (Karp, 1972; Garey & Johnson, 1979). It follows directly from Theorem 7 that k-GS is NP-complete as well since the problems are equivalent. □

The fact that problems k-GS and k-FVS are equivalent is not very surprising. Indeed, roughly speaking, the minimum grounding problem consists of finding a minimum set large enough to enable the reader to learn (reach) all the words of the dictionary. On the other hand, the minimum feedback vertex set problem consists of finding a minimum set large enough to break the circularity of the definitions in the dictionary. Hence, the problems are the same, even if they are stated differently.

Although the problem is NP-complete in general, 
we show that there is a simple way of reducing 
the complexity of the problem by considering the 
strongly connected components. 

3.3 Decomposing the Problem 

Let G = (V, E) be a directed graph and $G_1, G_2, \ldots, G_m$ the subgraphs induced by its strongly connected components, where $m \geq 1$. In particular, there are no cycles of G containing vertices in different strongly connected components. Since the minimum grounding set problem is equivalent to the minimum feedback vertex set problem, this means that when seeking a minimum grounding set of G, we can restrict ourselves to seeking minimum grounding sets of $G_i$, for $i = 1, 2, \ldots, m$. More precisely, we have the following proposition.

Proposition 9. Let G = (V, E) be a directed graph with m strongly connected components, with $m \geq 1$, and let $G_i = (V_i, E_i)$ be the subgraph induced by its i-th strongly connected component, where $1 \leq i \leq m$. Moreover, let $U_i$ be a minimum grounding set of $G_i$, for $i = 1, 2, \ldots, m$. Then $U = \bigcup_{i=1}^{m} U_i$ is a minimum grounding set of G.

Proof. First, we show that U is a grounding set of G. Let C be a cycle of G. Then C is completely contained in some strongly connected component of G, say $G_j$, where $1 \leq j \leq m$. But $U_j \subseteq U$ is a grounding set of $G_j$, therefore $U_j \cap C \neq \emptyset$ so that $U \cap C \neq \emptyset$. By Theorem 7, U is a grounding set of G. It remains to show that U is a minimum grounding set of G. By contradiction, assume that there exists a grounding set U' of G, with $|U'| < |U|$, and let $U'_i = U' \cap V_i$. Each $U'_i$ is a grounding set of $G_i$, since every cycle of $G_i$ is a cycle of G lying inside $V_i$ and therefore meets U'. Moreover, there exists an index j, with $1 \leq j \leq m$, such that $|U'_j| < |U_j|$, contradicting the minimality of $|U_j|$. □

Note that this proposition may be very useful for 
graphs having many small strongly connected com- 
ponents. Indeed, by using Tarjan's Algorithm (Tar- 
jan, 1972), the strongly connected components can be 
computed in linear time. We illustrate this reduction 
by an example. 

Example 10. Consider again the dictionary D and 
the graph G of Example 1. The strongly connected 
components of G are encircled in Figure 3 and 
minimum grounding sets (represented by words in 
squares) for each of them are easily found. Thus the 
grounding number of G is 5. 
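The decomposition of Proposition 9 can be sketched in Python for the dictionary of Table 1. The code below makes several simplifying assumptions of ours: strongly connected components are computed naively from mutual reachability (Tarjan's algorithm would be the choice for real dictionaries), and each component is solved by exhaustive search, which is only feasible because the components here have at most two vertices. All function names are hypothetical.

```python
from itertools import combinations

TABLE_1 = {
    "apple": "red fruit", "bad": "not good", "banana": "yellow fruit",
    "color": "dark or light", "dark": "not light", "eatable": "good",
    "fruit": "eatable thing", "good": "not bad", "light": "not dark",
    "not": "not", "or": "or", "red": "dark color", "thing": "thing",
    "tomato": "red fruit", "yellow": "light color",
}
D = {w: set(definition.split()) for w, definition in TABLE_1.items()}


def reachable_set(entries, known):
    """Algorithm 1: add words whose definientes are all known, until stable."""
    R = set(known)
    while True:
        S = {w for w, d_w in entries.items() if d_w <= R} - R
        if not S:
            return R
        R |= S


def strongly_connected_components(entries):
    """Classes of mutual reachability, computed naively (quadratic time)."""
    def descendants(start):
        seen, stack = {start}, [start]
        while stack:
            for d in entries[stack.pop()]:
                if d not in seen:
                    seen.add(d)
                    stack.append(d)
        return seen

    reach = {w: descendants(w) for w in entries}
    return {frozenset(v for v in reach[u] if u in reach[v]) for u in entries}


def minimum_grounding_set(entries):
    """Exhaustive search by increasing size; only sensible for tiny graphs."""
    words = sorted(entries)
    for p in range(len(words) + 1):
        for U in combinations(words, p):
            if reachable_set(entries, U) == set(entries):
                return set(U)


components = strongly_connected_components(D)
U = set()
for component in components:
    induced = {w: D[w] & component for w in component}  # the subgraph G_i
    U |= minimum_grounding_set(induced)

print(len(components), sorted(U))     # 13 ['bad', 'dark', 'not', 'or', 'thing']
print(reachable_set(D, U) == set(D))  # True
```

The set found is one minimum grounding set among several (for instance, "good" could replace "bad"); its size, 5, agrees with the grounding number stated in Example 10.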




Figure 3: The strongly connected components and a 
minimum grounding set of G 

3.4 The Grounding Kernel 

In Example 10, we have seen that there exist some strongly connected components consisting of only one vertex without any loop. In particular, there exist vertices with no successor, i.e. vertices v such that $N_G^+(v) = \emptyset$. For instance, this is the case of the words "apple", "banana" and "tomato", which are not used in any definition in the dictionary. Removing these three words, we notice that "fruit", "red" and "yellow" are in the same situation and they can be removed as well. Pursuing the same idea, we can now remove the words "color" and "eatable". At this point, we cannot remove any further words. The set of remaining words is called the grounding kernel of the graph G. More formally, we have the following definition.

Definition 11. Let D be a dictionary, G = (V, E) its associated graph and $G_1 = (V_1, E_1)$, $G_2 = (V_2, E_2)$, $\ldots$, $G_m = (V_m, E_m)$ the subgraphs induced by the strongly connected components of G, where $m \geq 1$. Let V' be the set of vertices u such that {u} is a strongly connected component without any loop (i.e., (u, u) is not an arc of G). For any u, let $N^*(u)$ denote the set of vertices v such that G contains a uv-path. Then the grounding kernel of G, denoted by $K_G$, is the set $V - \{u \mid u \in V' \text{ and } N^*(u) \subseteq V'\}$.

Clearly, every dictionary D admits a grounding kernel, as shown by Algorithm 2.

Algorithm 2 Computing the grounding kernel
function GroundingKernel(G)
    $G' \leftarrow G$
    repeat
        Let W be the set of vertices of G'
        $U \leftarrow \{v \in W \mid N^+_{G'}(v) = \emptyset\}$
        $G' \leftarrow G'[W - U]$
    until $U = \emptyset$
    return G'
end function

Moreover, the grounding kernel is a grounding set of its associated graph G and every minimum grounding set of G is a subset of the grounding kernel. Therefore, in studying the symbol grounding problem in dictionaries, we can restrict ourselves to the grounding kernel of the graph G corresponding to D. This phenomenon is interesting because every dictionary contains many words that can be recursively removed without compromising the understanding of the other definitions. Formally, this property relates to the level of a word: we will say of a word w that it is of level k if it is k-reachable from $K_G$ but not $\ell$-reachable from $K_G$, for any $\ell < k$. In particular, level 0 indicates that the word is part of the grounding kernel. A similar concept has been studied in (Changizi, 2008).

Example 12. Continuing Example 10 and from what we have seen so far, it follows that the grounding kernel of G is given by

$$K_G = \{\text{bad, dark, good, light, not, or, thing}\}.$$

Level 1 words are "color" and "eatable", level 2 
words are "fruit", "red" and "yellow", and level 3 
words are "apple", "banana" and "tomato". 
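The kernel and the levels of Example 12 can be recomputed with a short Python sketch of Algorithm 2. It assumes, as an illustration, that the dictionary of Table 1 is stored as a mapping from each word to its set of definientes; a vertex has no outgoing neighbor in the current subgraph exactly when it appears in no remaining definition. The function names `grounding_kernel` and `word_levels` are ours.

```python
TABLE_1 = {
    "apple": "red fruit", "bad": "not good", "banana": "yellow fruit",
    "color": "dark or light", "dark": "not light", "eatable": "good",
    "fruit": "eatable thing", "good": "not bad", "light": "not dark",
    "not": "not", "or": "or", "red": "dark color", "thing": "thing",
    "tomato": "red fruit", "yellow": "light color",
}
D = {w: set(definition.split()) for w, definition in TABLE_1.items()}


def grounding_kernel(entries):
    """Algorithm 2: repeatedly delete the words used in no remaining definition."""
    W = set(entries)
    while True:
        used = {d for w in W for d in entries[w]}  # vertices with an outgoing arc
        U = W - used
        if not U:
            return W
        W -= U


def word_levels(entries, kernel):
    """Level k = first step at which a word becomes reachable from the kernel."""
    level = {w: 0 for w in kernel}
    R, k = set(kernel), 0
    while True:
        S = {w for w, d_w in entries.items() if d_w <= R} - R
        if not S:
            return level
        k += 1
        level.update((w, k) for w in S)
        R |= S


K = grounding_kernel(D)
level = word_levels(D, K)
by_level = {}
for w, k in level.items():
    by_level.setdefault(k, []).append(w)
for k in sorted(by_level):
    print(k, sorted(by_level[k]))
# 0 ['bad', 'dark', 'good', 'light', 'not', 'or', 'thing']
# 1 ['color', 'eatable']
# 2 ['fruit', 'red', 'yellow']
# 3 ['apple', 'banana', 'tomato']
```

Since every word receives a level, the run also illustrates that the kernel is a grounding set of this graph.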

4 Grounding Sets and the Mental 
Lexicon 

In Section 3, we introduced all the necessary terminology to study the symbol grounding problem using graph theory and digital dictionaries. In this section, we explain how this model can be useful and on what assumptions it is based.

A dictionary is a formal symbol system. The pre- 
ceding section showed how formal methods can be 
applied to this system in order to extract formal fea- 
tures. In cognitive science, this is the basis of com- 
putationalism (or cognitivism or "disembodied cog- 
nition" (Pylyshyn, 1984)), according to which cogni- 
tion, too, is a formal symbol system - one that can 



be studied and explained independently of the hard- 
ware (or, insofar as it concerns humans, the wetware) 
on which it is implemented. However, pure computa- 
tionalism is vulnerable to the problem of the ground- 
ing of symbols too (Harnad, 1990). Some of this can 
be remedied by the competing paradigm of embod- 
ied cognition (Barsalou, 2008; Glenberg & Robert- 
son, 2002; Steels, 2007), which draws on dynamical 
(noncomputational) systems theory to ground cogni- 
tion in sensorimotor experience. Although compu- 
tationalism and symbol grounding provide the back- 
ground context for our investigations and findings, 
the present paper does not favor any particular theory 
of mental representation of meaning. 

A dictionary is a symbol system that relates words to words in such a way that the meanings of the definienda are conveyed via the definientes. The user is intended to arrive at an understanding of an unknown word through an understanding of its definition. What was formally demonstrated in Section 3 agrees with common sense: although one can learn new word meanings from a dictionary, the entire dictionary cannot be learned in this way because of circular references in the definitions (cycles, in graph-theoretic terminology). Information - nonverbal information - must come from outside the system to ground at least some of its symbols by some means other than just formal definition (Cangelosi & Harnad, 2001). For humans, the two options are learned sensorimotor grounding and innate grounding. (Although the latter is no doubt important, our current focus is more on the former.)

The need for information from outside the dictio- 
nary is formalized in Section 3. Apart from confirm- 
ing the need for such external grounding, we take a 
symmetric stance: In natural language, some word 
meanings — especially highly abstract ones, such as 
those of mathematical or philosophical terms — are 
not or cannot be acquired through direct sensorimo- 
tor grounding. They are acquired through the com- 
position of previously known words. The meaning 
of some of those words, or of the words in their re- 
spective definitions, must in turn have been grounded 
through direct sensorimotor experience. 

To state this in another way: Meaning is not just 
formal definitions all the way down; nor is it just sen- 
sorimotor experience all the way up. The two extreme 
poles of that continuum are sensorimotor induction 
at one pole (trial and error experience with corrective 
feedback; observation, pointing, gestures, imitation, 
etc.), and symbolic instruction (definitions, descrip- 
tions, explanation, verbal examples etc.) at the other 
pole. Being able to identify from their lexicological 
structure which words were acquired one way or the 
other would provide us with important clues about 
the cognitive processes underlying language and the 
mental representation of meaning. 



To compare the word meanings acquired via sensorimotor induction with word meanings acquired via symbolic instruction (definitions), we first need access to the encoding of that knowledge. In this component of our research, our hypothesis is that the representational structure of word meanings in dictionaries shares some commonalities with the representational structure of word meanings in the human brain (Hauk et al., 2008). We are thus trying to extract from dictionaries the grounding kernel (and eventually a minimum grounding set, which in general is a proper subset of this kernel), from which the rest of the dictionary can be reached through definitions alone. We hypothesize that this kernel, identified through formal structural analysis, will exhibit properties that are also reflected in the mental lexicon. In parallel ongoing studies, we are finding that the words in the grounding kernel are indeed (1) more frequent in oral and written usage, (2) more concrete, (3) more readily imageable, and (4) learned earlier or at a younger age. We also expect they will be (5) more universal (across dictionaries, languages and cultures) (Chicoisne et al., 2008).

5 Grounding Kernels in Natural 
Language Dictionaries 

In earlier research (Clark, 2003), we analyzed two special dictionaries: the Longman's Dictionary of Contemporary English (LDOCE) (Procter, 1978) and the Cambridge International Dictionary of English (CIDE) (Procter, 1995). Both are officially described as being based upon a defining vocabulary: a set of 2000 words which are purportedly the only words used in all the definitions of the dictionary, including the definitions of the defining vocabulary itself. A closer analysis of this defining vocabulary, however, has revealed that it is not always faithful to these constraints: A significant number of words used in the definitions turn out not to be in the defining vocabulary. Hence it became evident that we would ourselves have to generate a grounding kernel (roughly equivalent to the defining vocabulary) from these dictionaries.

The method presented in this paper makes it possible, given the graph structure of a dictionary, to extract a grounding kernel therefrom. Extracting this structure in turn confronts us with two further problems: morphology and polysemy. Neither of these problems has a definite algorithmic solution. Morphology can be treated through stemming and associated look-up lists for the simplest cases (e.g., was → to be, and children → child), but more elaborate or complicated cases would require syntactic analysis or, ultimately, human evaluation. Polysemy is usually treated through statistical analysis of the word context (as in Latent Semantic Analysis) (Kintsch, 2007) or human evaluation. Indeed, a good deal of background knowledge is necessary to analyse an entry such as: "dominant: the fifth note of a musical scale of eight notes" (the LDOCE notes 16 different meanings of scale and 4 for dominant, and in our example, none of these words are used with their most frequent meaning).

Correct disambiguation of a dictionary is time- 
consuming work, as the most effective way to do it 
for now is through consensus among human evalua- 
tors. Fortunately, a fully disambiguated version of the 
WordNet database (Fellbaum, 1998; Fellbaum, 2005) 
has just become available. We expect the grounding 
kernel of WordNet to be of greater interest than the 
defining vocabulary of either CIDE or LDOCE (or 
what we extract from them and disambiguate auto- 
matically, and imperfectly) for our analysis. 

6 Future Work 

The main purpose of this paper was to introduce a for- 
mal approach to the symbol grounding problem based 
on the computational analysis of digital dictionaries. 
Ongoing and future work includes the following: 

The minimum grounding set problem. We have 
seen that the problem of finding a minimum ground- 
ing set is NP-complete for general graphs. However, 
graphs associated with dictionaries have a very spe- 
cific structure. We intend to describe a class of graphs 
including those specific graphs and to try to design 
a polynomial-time algorithm to solve the problem. 
Another approach is to design approximation algo- 
rithms, yielding a solution close to the optimal solu- 
tion, with some known guarantee. 

Grounding sets satisfying particular constraints. Let D be a dictionary, G = (V, E) its associated graph, and $U \subseteq V$ any subset of vertices satisfying a given property P. We can use Algorithm 1 to test whether or not U is a grounding set. In particular, it would be interesting to test different sets U satisfying different cognitive constraints.

Relaxing the grounding conditions. In this paper we imposed strong conditions on the learning of new words: One must know all the words of the definition fully in order to learn a new word from them. This is not realistic, because we all know one can often understand a definition without knowing every single word in it. Hence one way to relax these conditions would be to modify the learning rule so that one need only understand at least r% of the definition, where r is some number between 0 and 100. Another variation would be to assign weights to words to take into account their morphosyntactic and semantic properties (rather than just treating them as an unordered list, as in the present analysis). Finally, we could consider "quasi-grounding sets", whose associated reachable set consists of r% of the whole dictionary.
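One possible reading of the relaxed learning rule is sketched below in Python. This is a hypothetical variant of Algorithm 1, not a method developed in this paper: a word is considered learned once at least r% of the words in its definition are known, with all definientes counted equally. The function name `relaxed_reachable_set`, the choice of starting set and the threshold of 50% are assumptions made purely for illustration, using the toy dictionary of Table 1.

```python
TABLE_1 = {
    "apple": "red fruit", "bad": "not good", "banana": "yellow fruit",
    "color": "dark or light", "dark": "not light", "eatable": "good",
    "fruit": "eatable thing", "good": "not bad", "light": "not dark",
    "not": "not", "or": "or", "red": "dark color", "thing": "thing",
    "tomato": "red fruit", "yellow": "light color",
}
D = {w: set(definition.split()) for w, definition in TABLE_1.items()}


def relaxed_reachable_set(entries, known, r):
    """Variant of Algorithm 1: a word is learned once at least r percent
    of its definientes are known (r = 100 gives back the strict rule)."""
    R = set(known)
    while True:
        S = {
            w for w, d_w in entries.items()
            if 100 * len(d_w & R) >= r * len(d_w)
        } - R
        if not S:
            return R
        R |= S


U = {"not", "thing"}
strict = relaxed_reachable_set(D, U, 100)
relaxed = relaxed_reachable_set(D, U, 50)

print(sorted(strict))                      # ['not', 'thing']
print(sorted(set(D) - relaxed))            # ['or']
print(round(100 * len(relaxed) / len(D)))  # 93: percentage of the dictionary reached
```

Under the strict rule nothing can be learned from U, whereas under the 50% rule every word except "or" is reached, so U would count as a "quasi-grounding set" for any coverage threshold up to about 93%.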

Disambiguation of definitional relations. Analyz- 



ing real dictionaries raises, in its full generality, the 
problem of word and text disambiguation in free text; 
this is a very difficult problem. For example, if the 
word "make" appears in a definition, we do not know 
which of its many senses is intended — nor even what 
its grammatical category is. To our knowledge, the 
only available dictionary that endeavors to provide 
fully disambiguated definitions is the just-released 
version of WordNet. On the other hand, dictionary 
definitions have a very specific grammatical structure, 
presumably simpler and more limited than the gen- 
eral case of free text. It might hence be feasible to 
develop automatic disambiguation algorithms specif- 
ically dedicated to the special case of dictionary defi- 
nitions. 

Concluding Remark: Definition can reach the 
sense (sometimes), but only the senses can reach the 
referent. 

Research funded by Canada Research Chair in Cognitive Sciences, SSHRC (S. Harnad) and NSERC